Character sets of strings

Authors

  • Gilles Didier
  • Thomas Schmidt
  • Jens Stoye
  • Dekel Tsur
Abstract

Given a string S over a finite alphabet Σ, the character set (also called the fingerprint) of a substring S′ of S is the subset C ⊆ Σ of the symbols occurring in S′. The study of the character sets of all the substrings of a given string (or a given collection of strings) appears in several domains, such as rule induction for natural language processing and comparative genomics. Several queries about the character sets of a string arise from these applications, especially: (1) Output all the maximal locations of substrings having a given character set. (2) Output, for each character set C occurring in a given string (or a given collection of strings), all the maximal locations of C. Denoting by n the total length of the considered string or collection of strings, we solve the first problem in Θ(n) time using Θ(n) space. We present two algorithms solving the second problem. The first one runs in Θ(n²) time using Θ(n) space. The second algorithm has Θ(n|Σ| log |Σ|) time and Θ(n) space complexity and is an adaptation of an algorithm by Amir et al. (J. Discr. Alg., 26:1–13, 2003).
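The two queries can be illustrated with a small Python sketch. This is not the authors' algorithm: it assumes that a maximal location of C is an inclusive index pair (i, j) such that the symbols of S[i..j] are exactly C and neither neighbouring symbol (if any) belongs to C. The function names `maximal_locations` and `all_maximal_locations` are hypothetical. The first function answers query (1) by scanning maximal runs of symbols from C; the second is a plain brute-force enumeration for query (2), intended only as a cross-check and not matching the bounds stated in the abstract.

```python
def maximal_locations(s, charset):
    """Query (1): inclusive (start, end) pairs of maximal substrings of s
    whose character set is exactly `charset` (single left-to-right scan)."""
    target = set(charset)
    if not target:
        return []
    result = []
    start = None   # start index of the current run of symbols from target
    seen = set()   # symbols of target met in the current run
    for i, ch in enumerate(s):
        if ch in target:
            if start is None:
                start, seen = i, set()
            seen.add(ch)
        else:
            # The run ends here; it qualifies only if it used every symbol.
            if start is not None and len(seen) == len(target):
                result.append((start, i - 1))
            start = None
    if start is not None and len(seen) == len(target):
        result.append((start, len(s) - 1))
    return result


def all_maximal_locations(s):
    """Query (2), brute force: map each occurring character set to the list
    of its maximal locations. Illustrative only, not the paper's method."""
    n = len(s)
    table = {}
    for i in range(n):
        current = set()
        for j in range(i, n):
            current.add(s[j])
            # Maximal if it cannot be extended on either side within `current`.
            left_ok = i == 0 or s[i - 1] not in current
            right_ok = j == n - 1 or s[j + 1] not in current
            if left_ok and right_ok:
                table.setdefault(frozenset(current), []).append((i, j))
    return table
```

For example, `maximal_locations("abcabbad", "ab")` scans the runs `ab` and `abba`, both of which use every symbol of {a, b}, while `maximal_locations("aacb", "ab")` finds no location because no run contains both symbols.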

Similar resources

Extraction of Character Strings from House Maps

In this paper, we propose an experimental extraction method of character strings from house map images, using the block information. Our method consists of two steps: the first is to recognize the block information, and the second is to extract character strings with respect to the recognized block information. In comparison with urban maps, which have often been investigated for extractio...

Finite state intensional semantics

Suppose possible worlds are strings, rather than physically structured worlds like ours. Then the proposition corresponding to a sentence or a formula in logical language is a set of strings; an epistemic acquaintance relation is a relation between strings; and in a relational construction of partition semantics for questions, a question meaning is a relation between strings. If discourse refer...

An example of design optimization for high evolvability: string rewriting grammar.

As an example of the optimization of an evolutionary system design, a string rewriting system is studied. A set of rewriting rules that defines the growth of a string is experimentally optimized in terms of maximizing the 'replicative capacity', that is, the occurrence ratio of self-replicating strings. It is shown that the most optimized rule set allows many strings to self-replicate by using ...

String Distances and Uniformities

The Levenshtein or edit distance was developed as a metric for calculating distances between character strings. We are looking at weighting the different edit operations (insertion, deletion, substitution) to obtain different types of classifications of sets of strings. As a more general and less constrained approach we introduce topological notions and in particular uniformities.

Distance Based Indexing for String Proximity Search

In many database applications involving string data, it is common to have near neighbor queries (asking for strings that are similar to a query string) or nearest neighbor queries (asking for strings that are most similar to a query string). The similarity between strings is defined in terms of a distance function determined by the application domain. The most popular string distance measures a...

Journal:
  • J. Discrete Algorithms

Volume 5

Publication date: 2007